[WIP][SQL] Put check in ExpressionEncoder.fromRow to ensure we can convert deserialized object to required type #16546

Closed
wants to merge 3 commits into from

Conversation

viirya
Member

@viirya viirya commented Jan 11, 2017

What changes were proposed in this pull request?

Two problems are addressed in this patch.

  1. Serializing a subclass of Seq[_] that has no element type argument

Currently, ScalaReflection.serializerFor tries to serialize every subtype of Seq[_]. But Range, which is a Seq[Int] and takes no element type argument of its own, makes serializerFor fail with a cryptic error:

scala.MatchError: scala.collection.immutable.Range.Inclusive (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:520)
  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:463)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71) 

This patch tries to fix this by also handling types that have no element type argument.
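
To illustrate the first problem, here is a minimal sketch using plain Scala runtime reflection rather than Spark's ScalaReflection. It assumes scala-reflect is on the classpath, and the object and helper names are made up for this example. Range.Inclusive is a subtype of Seq[_] but carries no type arguments itself, so a pattern that expects exactly one type argument has nothing to bind:

```scala
import scala.reflect.runtime.universe._

object ElementTypeArgsSketch {
  // Hypothetical helper: the type arguments written directly on `t`.
  // `dealias` resolves aliases such as scala.Seq before inspecting them.
  def directTypeArgs(t: Type): List[Type] = t.dealias.typeArgs

  def main(args: Array[String]): Unit = {
    val seqOfInt = typeOf[Seq[Int]]
    val range = typeOf[Range.Inclusive]

    // Range.Inclusive passes the "is it a Seq?" check ...
    println(range <:< typeOf[Seq[_]])
    // ... and Seq[Int] exposes its element type as a type argument ...
    println(directTypeArgs(seqOfInt))
    // ... but Range.Inclusive has no type arguments to read the element type from.
    println(directTypeArgs(range))
  }
}
```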

  2. Encoder can't deserialize an internal row to the required type

We serialize objects that share a common superclass, such as Seq[_], to a common internal format. When we deserialize that internal data back to the original objects, however, we have to construct objects of different concrete types.

For example, data serialized from Seq[_] is deserialized to WrappedArray. This works when the data we serialized was a Seq[_]. If we serialize a subclass of Seq[_] (for example Range) that is not assignable from WrappedArray, a runtime error occurs when the deserialized data is converted to the required subclass of Seq[_].

Short of explicitly writing a rule to deserialize each subclass of Seq[_], I think the feasible solution is to check whether the deserialized data can be converted to the required type. This patch puts that check into ExpressionEncoder.fromRow. When the requirement is not met, we show a reasonable message to users.
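
The idea behind the check can be sketched outside Spark as follows. This is not the code in the patch: convertChecked and FromRowCheckSketch are hypothetical names, and the sketch only assumes that the required type is available as a java.lang.Class.

```scala
object FromRowCheckSketch {
  // Hypothetical helper (not Spark's API): verify that the deserialized
  // object is an instance of the required class before converting it, and
  // fail with a readable message instead of a bare ClassCastException.
  def convertChecked[T](deserialized: Any, required: Class[T]): T = {
    if (!required.isInstance(deserialized)) {
      val actual =
        if (deserialized == null) "null" else deserialized.getClass.getName
      throw new RuntimeException(
        s"Cannot convert deserialized object of type $actual " +
          s"to the required type ${required.getName}")
    }
    required.cast(deserialized)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for deserialized data: a WrappedArray on Scala 2.11/2.12.
    val deserialized: Any = Array(1, 2, 3).toSeq

    // The required type is Seq[_]: the check passes.
    println(convertChecked(deserialized, classOf[Seq[_]]))

    // The required type is Range: the check fails with the message above.
    try convertChecked(deserialized, classOf[Range])
    catch { case e: RuntimeException => println(e.getMessage) }
  }
}
```

The check trades a late ClassCastException for an early error that names both the actual and the required type. It does not make Range deserializable.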

How was this patch tested?

Jenkins tests.


@SparkQA

SparkQA commented Jan 11, 2017

Test build #71192 has started for PR 16546 at commit 190fb62.

@viirya
Member Author

viirya commented Jan 11, 2017

retest this please.

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71196 has finished for PR 16546 at commit 190fb62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jan 11, 2017

Just noticed that a related and interesting PR, #16240, was merged recently.

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71218 has finished for PR 16546 at commit 0af969e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (!(t <:< localTypeOf[Seq[_]])) {
  return None
}
val TypeRef(_, _, elementTypeList) = t

This would probably be better solved by using match and typeParams. Something like this:

t match {
  case TypeRef(_, _, Seq(elementType)) => Some(elementType)
  case _ =>
    t.baseClasses.find { c =>
      val cType = c.asClass.toType
      cType <:< localTypeOf[Seq[_]] && cType.typeParams.nonEmpty
    }.map(t.baseType(_).typeParams.head)
}

Also not sure whether types with more than one type parameter are handled correctly.
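
A variation on this suggestion, as a sketch only, is to ask the type for its view of Seq and read the element type from there. It uses plain scala.reflect.runtime.universe with typeOf in place of Spark's localTypeOf, and seqElementType is a hypothetical name. Because the lookup goes through Seq's own type argument, it should not depend on how many type parameters the subclass declares, though that is not tested here.

```scala
import scala.reflect.runtime.universe._

object SeqElementTypeSketch {
  // Hypothetical helper: returns the element type of `t` if it is a Seq.
  def seqElementType(t: Type): Option[Type] = {
    if (!(t <:< typeOf[Seq[_]])) {
      None
    } else {
      // baseType(Seq) is `t` seen as a Seq, e.g. Seq[Int] for Range.Inclusive,
      // so its single type argument is the element type.
      t.baseType(typeOf[Seq[_]].typeSymbol).typeArgs.headOption
    }
  }

  def main(args: Array[String]): Unit = {
    println(seqElementType(typeOf[Seq[String]]))
    println(seqElementType(typeOf[Range.Inclusive]))
    println(seqElementType(typeOf[String]))
  }
}
```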

@viirya viirya closed this Feb 23, 2017
@viirya viirya deleted the encoder-range branch December 27, 2023 18:20